1  Statement

In this STIR project, we made three major contributions to the application of sparse and low-rank
representation in geometric 3D modeling of urban structures and in high-dimensional pattern recognition at large.

1. We proposed a novel algorithm to effectively detect geometry-rich low-rank patterns in natural images.

2. We extended a sparse representation-based classification framework to the small-sample-set scenario.
We demonstrated the utility of the new technique on the challenging problem of single-sample face
recognition.

3. We studied the problem of accelerating sparse optimization solvers. The result is a set of numerical
solvers that achieve state-of-the-art performance in both the speed and the accuracy of
recovering high-dimensional structured sparse signals.

2  Geometric  Segmentation  of  Natural  Images 

In the literature of image-based 3D modeling and reconstruction, several types of global or semi-global
image features have recently been proposed. In urban-scene modeling, symmetric texture regions are widely
used. Using the virtual views of symmetric patterns, their 3D orientation can be readily estimated from just a
single image. Another type of geometric feature used in 3D modeling is homogeneous color regions, such
as superpixels, whose orientation under perspective projection is consistent with that of some global planar
structures in space. Finally, in object recognition and segmentation, various types of object part-based
regions that contain rich semantic information have been proposed.

More recently, motivated by the emerging theory of Robust PCA, a new type of invariant feature has been
proposed, called transform-invariant low-rank texture (TILT). The fundamental idea of TILT is that image
texture that represents regular or repetitive 3D shapes in space is often low rank when the texture region is
represented as a matrix of its pixel values. However, under camera perspective distortion and potential pixel
corruption, the matrix representation of the texture in the image space exhibits much higher rank than
its canonical representation, i.e., the texture observed under orthographic projection and free of pixel
corruption. Therefore, the rank of the texture region can be used as part of an objective function to rectify
the underlying image distortion. This new approach suggests that we can obtain accurate geometric models
of many urban objects, such as buildings, hallways, road signs, and humans, without relying on the extraction
of any traditional local features (as shown in Figure 1). More importantly, TILT features can be shown to
be robust to camera perspective distortion and can also compensate for a moderate amount of pixel corruption,
which are the main advantages over other existing global invariant features.
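The rank argument can be checked on a toy example. The following is a minimal sketch, not the TILT algorithm itself: the synthetic "window grid" texture, the staircase shear used as a stand-in for perspective distortion, and the helper names `numerical_rank` and `shear_rows` are all assumptions made for illustration.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def shear_rows(M):
    """Toy distortion: cyclically shift row i by i // 4 pixels (a staircase shear)."""
    return np.stack([np.roll(row, i // 4) for i, row in enumerate(M)])

# Canonical view: a regular grid, built as the outer product of one periodic 0/1 profile.
profile = (np.arange(64) % 16 < 8).astype(float)
canonical = np.outer(profile, profile)

# Distorted view of the same texture.
distorted = shear_rows(canonical)

print("rank of canonical texture:", numerical_rank(canonical))
print("rank of sheared texture:  ", numerical_rank(distorted))
```

The canonical grid is an outer product and therefore has rank one, while the sheared copy needs several singular vectors to describe the same pattern. Searching for the transform that brings the rank back down is the intuition behind using rank in the rectification objective.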


Figure 1: Examples of manually labeled image patterns that are extracted as TILT features. Top: Initialization
of the feature locations as the red bounding boxes, and the final orientation of the feature as the green
bounding boxes. The TILT features compensate for the perspective distortion. Bottom: Canonical representation
of the low-rank matrices.
In this project, we have made notable contributions to extending the utility of TILT features in image
segmentation and 3D reconstruction. We have proposed a novel algorithm to effectively recover a top-down
hierarchical TILT feature representation in urban images [1]. Compared to traditional image feature
detection techniques, Robust PCA is still a very expensive operator when applied to high-resolution images.
Therefore, naive approaches that use sliding windows, random sampling, or fixed grids are not tractable for
finding low-rank texture regions. The algorithm in [1] first utilizes the canonical low-rank matrix representation
of image texture and effectively segments urban images into a geometric layer and a non-geometric
layer, as shown in Figure 2. In the geometric layer, a multi-scale TILT detection process is applied to further
group different scales of TILT features into complexes, each of which represents a more global texture
facade that is robust to camera distortion, foreground occlusion, and non-Lambertian texture. The algorithm
is also capable of rejecting noisy outlying TILT features. Figure 3 demonstrates some representative results
of the multi-scale TILT detection algorithm.
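To make the cost of Robust PCA concrete, the following is a minimal sketch of the generic low-rank plus sparse decomposition, solved with an inexact augmented Lagrangian loop. It is not the detection algorithm of [1]. The function name `robust_pca`, the weight `lam = 1/sqrt(max(m, n))`, the penalty schedule (`mu`, `rho`), and the synthetic test matrix are conventional choices assumed here for illustration.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(D, lam=None, tol=1e-7, max_iter=500):
    """Split D into a low-rank part A and a sparse part E (D is approximately A + E)."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    spectral = np.linalg.norm(D, 2)
    Y = D / max(spectral, np.abs(D).max() / lam)   # multiplier initialization
    mu = 1.25 / spectral                           # penalty parameter (assumed schedule)
    mu_max, rho = mu * 1e7, 1.5
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        # Singular value thresholding: one full SVD per iteration is the costly step.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        E = soft_threshold(D - A + Y / mu, lam / mu)
        residual = D - A - E
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / d_norm < tol:
            break
    return A, E

# Synthetic check: a rank-2 matrix with about 5% of its entries grossly corrupted.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 80))
mask = rng.random((80, 80)) < 0.05
S0 = mask * rng.choice([-10.0, 10.0], size=(80, 80))
A_hat, E_hat = robust_pca(L0 + S0)
rel_err = np.linalg.norm(A_hat - L0, "fro") / np.linalg.norm(L0, "fro")
print("relative error of the low-rank estimate:", rel_err)
```

Every iteration computes a singular value decomposition of the whole window, which is why evaluating such an operator on every sliding window or grid cell of a high-resolution image quickly becomes impractical.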


Figure  2:  Segmentation  of  a  natural  image  (left)  into  a  geometric  layer  (middle)  and  a  non-geometric  layer 
(right). 


Figure 3: Examples of multi-scale TILT detection (top) and clustering into facades (bottom). The local
coordinate frames are superimposed to indicate the estimated surface orientation, and the green arrows indicate
the normal vectors. The estimation is robust to large perspective distortion, vegetation occlusion, and
non-Lambertian surfaces.
3  Single-Sample  Face  Recognition  via  Sparse  Illumination  Transfer  (SIT)


Single-sample face alignment and recognition represents an important step towards practical face recognition
solutions using images collected in the wild or on the Internet. We contend that the problem can be
solved quite effectively by a simple yet elegant algorithm. The key observation is that having only one sample
per class mainly deprives the algorithm of an illumination subspace model for each individual class. We showed
in [2] that a sparse illumination transfer (SIT) dictionary can be constructed to compensate for the lack of
illumination information in the training set.

Because most human faces have similar shapes, a single subject is often sufficient to provide
images of different illumination patterns, although adding more subjects may further improve the accuracy.
The subject(s) for illumination transfer can be selected from outside the set of training subjects for recognition.
Finally, we show that the other image nuisances, including pose variation and image corruption, can be
readily corrected using a single reference image per class, taken under an arbitrary illumination condition,
combined with the SIT dictionary. The SIT dictionary also does not need any information about possible facial
corruption for the algorithm to be robust. To the best of our knowledge, this work is the first to propose
a solution that performs facial illumination compensation in the alignment stage and illumination and pose
transfer in the recognition stage.

In terms of algorithmic complexity, the construction of the SIT dictionary is extremely simple when
the illumination data of the SIT subject(s) are provided, and it does not necessarily involve any dictionary
learning algorithm. The algorithm is also fast to execute in the alignment and recognition stages compared
to the other sparse-representation classifier (SRC)-type algorithms because the sparse optimization solver
now faces much smaller linear systems.
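The following toy sketch illustrates the idea of pairing one reference image per class with a shared illumination dictionary. It is not the algorithm of [2]: the construction of `C` as differences between the SIT subject's images and its reference image, the plain ISTA solver `lasso_ista`, the residual rule in `classify_with_transfer`, and the random vectors standing in for face images are all assumptions made for illustration.

```python
import numpy as np

def lasso_ista(B, b, lam=0.01, n_iter=2000):
    """Minimize 0.5*||B w - b||^2 + lam*||w||_1 with proximal-gradient (ISTA) steps."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    w = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = w - step * (B.T @ (B @ w - b))
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

def classify_with_transfer(A, C, b, lam=0.01):
    """A: one reference column per class; C: illumination atoms shared by all classes."""
    w = lasso_ista(np.hstack([A, C]), b, lam)
    x, y = w[:A.shape[1]], w[A.shape[1]:]
    # Residual of explaining b with class i alone plus the shared illumination term.
    residuals = [np.linalg.norm(b - A[:, i] * x[i] - C @ y) for i in range(A.shape[1])]
    return int(np.argmin(residuals))

rng = np.random.default_rng(0)
dim, n_classes, n_illum = 200, 5, 4

# One reference "image" per class (random vectors stand in for aligned faces).
A = rng.standard_normal((dim, n_classes)) / np.sqrt(dim)

# SIT subject: a reference image plus the same face under other lighting.
lighting = rng.standard_normal((dim, n_illum)) / np.sqrt(dim)
sit_reference = rng.standard_normal(dim) / np.sqrt(dim)
sit_images = sit_reference[:, None] + lighting

# Dictionary construction needs no learning here: subtract the reference image.
C = sit_images - sit_reference[:, None]

# Query: class 3 observed under a lighting pattern absent from its single sample.
b = A[:, 3] + 0.7 * lighting[:, 1]
predicted = classify_with_transfer(A, C, b)
print("predicted class:", predicted)
```

The linear system has only `n_classes + n_illum` columns, one per class plus the shared illumination atoms, which is the sense in which the solver faces a much smaller problem than when every class carries its own set of illumination samples.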

Our extensive experiments have demonstrated that the new algorithms significantly outperform the existing
algorithms in the single-sample regime, and with fewer restrictions. In particular, the face alignment
accuracy is comparable to that of the well-known Deformable SRC algorithm using multiple training images,
and the face recognition accuracy significantly exceeds that of the Extended SRC algorithms using
hand-labeled alignment initialization. A comparison of the accuracy of single-sample face recognition via
SIT with that of Deformable SRC (DSRC) and Misalignment Robust Representation (MRR) is shown in Table 1
using the standard Multi-PIE database.

Table  1:  Single-sample  alignment  +  recognition  accuracy  on  Multi-PIE  database. 


Method    Session 1 (%)    Session 2 (%)
DSRC      36.1             35.7
MRR       46.2             34.6
SIT       79.9             65.7

UC  Berkeley  has  filed  a  US  patent  application  about  the  new  technique. 


4  Acceleration  of  Sparse  Optimization  Algorithms 

In this project, we studied the speed and scalability of sparse optimization algorithms for solving
ℓ1-minimization (ℓ1-min) type problems. Motivated by the emerging compressive sensing theory, the
sparsity-seeking property of ℓ1-min optimization has been shown to have applications in many areas such as
geophysics; speech recognition; image compression, processing, and enhancement; sensor networks; and computer
vision. In particular, the applications discussed in the previous sections are also examples of sparse
optimization problems.

While ℓ1-min can be cast as a linear program and readily solved by classical convex optimization
methods, the computational complexity of these methods is often too high for large-scale, high-dimensional
image data. In light of the large number of real applications in compressive sensing, many new efficient
algorithms have been proposed over the past decade.
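For reference, the linear-program form of ℓ1-min can be written out explicitly. The symbols below are not defined in this report and are assumed for illustration: A is the measurement (or dictionary) matrix, b the observation, x the sparse unknown, and u, v its nonnegative and nonpositive parts with x = u - v.

```latex
\min_{x}\ \|x\|_1 \quad \text{subject to} \quad Ax = b
\qquad\Longleftrightarrow\qquad
\min_{u,\,v}\ \mathbf{1}^{\top}(u + v) \quad \text{subject to} \quad A(u - v) = b,\quad u \ge 0,\quad v \ge 0 .
```

The reformulation doubles the number of variables, and general-purpose linear programming solvers do not exploit the sparsity of the solution, which is one reason specialized first-order methods are attractive at image scale.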

Our first contribution is a novel ℓ1-min solution based on a classical convex optimization technique
known as the augmented Lagrangian method (ALM). In our work [3], we have thoroughly compared the ALM
algorithms with several state-of-the-art acceleration techniques for ℓ1-min problems, which include two
classical solutions using the interior-point method and the Homotopy method, and several first-order methods
including proximal-point methods, parallel coordinate descent, approximate message passing, and templates
for convex cone solvers (TFOCS).
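The following is a minimal sketch of how an augmented Lagrangian loop can be applied to the equality-constrained problem min ||x||_1 subject to Ax = b. It is not the implementation benchmarked in [3]: the function name `l1_min_alm`, the fixed penalty `mu`, the FISTA inner solver, the iteration counts, and the synthetic test problem are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise shrinkage toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_min_alm(A, b, mu=10.0, outer_iters=100, inner_iters=100, tol=1e-8):
    """Approximately solve min ||x||_1 s.t. Ax = b.

    Augmented Lagrangian: ||x||_1 + y^T (b - Ax) + (mu / 2) * ||b - Ax||^2.
    """
    lipschitz = np.linalg.norm(A, 2) ** 2      # largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    y = np.zeros(A.shape[0])                   # Lagrange multiplier estimate
    for _ in range(outer_iters):
        # Inner problem: min_x ||x||_1 + (mu / 2) * ||Ax - (b + y / mu)||^2, via FISTA.
        target = b + y / mu
        z, t = x.copy(), 1.0
        for _ in range(inner_iters):
            grad_step = z - A.T @ (A @ z - target) / lipschitz
            x_new = soft_threshold(grad_step, 1.0 / (mu * lipschitz))
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            z = x_new + ((t - 1.0) / t_new) * (x_new - x)
            x, t = x_new, t_new
        # Outer step: multiplier update driven by the constraint violation.
        r = b - A @ x
        y = y + mu * r
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

# Synthetic check: recover a 5-sparse vector from 50 random measurements.
rng = np.random.default_rng(0)
m, n, k = 50, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
b = A @ x_true
x_hat = l1_min_alm(A, b)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative recovery error:", rel_err)
```

Each inner step costs only two matrix-vector products and a shrinkage, in contrast to the linear solves required by interior-point methods, which is the kind of trade-off the comparison above is concerned with.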

To concretely demonstrate the performance of ALM and the other algorithms, we have compiled a
benchmark using both synthetic data and real high-dimensional image data from face recognition. The ALM
algorithms compare favorably against a wide range of state-of-the-art ℓ1-min algorithms and, more importantly,
are very suitable for large-scale face recognition and alignment problems in practice. To aid peer
evaluation, all algorithms discussed in this work have been made publicly available on our website as a
MATLAB toolbox.